A REFINED HARMONIC LANCZOS BIDIAGONALIZATION 
METHOD AND AN IMPLICITLY RESTARTED ALGORITHM FOR 
COMPUTING THE SMALLEST SINGULAR TRIPLETS OF LARGE 



Abstract. The harmonic Lanczos bidiagonahzation method can be used to compute the small- 
est singular triplets of a large matrix A. We prove that for good enough projection subspaces 
harmonic Ritz values converge if the columns of A are strongly linearly independent. On the other 
hand, harmonic Ritz values may miss some desired singular values when the columns of A almost 
linearly dependent. Furthermore, harmonic Ritz vectors may converge irregularly and even may 
fail to converge. Based on the refined projection principle for large matrix eigenproblems due to 
the first author, we propose a refined harmonic Lanczos bidiagonahzation method that takes the 
Rayleigh quotients of the harmonic Ritz vectors as approximate singular values and extracts the 
best approximate singular vectors, called the refined harmonic Ritz approximations, from the given 
subspaces in the sense of residual minimizations. The refined approximations are shown to converge 
to the desired singular vectors once the subspaces are sufficiently good and the Rayleigh quotients 
converge. An implicitly restarted refined harmonic Lanczos bidiagonahzation algorithm (IRRHLB) 
is developed. We study how to select the best possible shifts, and suggest refined harmonic shifts that 
are theoretically better than the harmonic shifts used within the implicitly restarted Lanczos bidi- 
agonahzation algorithm (IRHLB). We propose a novel procedure that can numerically compute the 
refined harmonic shifts efficiently and accurately. Numerical experiments are reported that compare 
IRRHLB with five other algorithms based on the Lanczos bidiagonahzation process. It appears that 
IRRHLB is at least competitive with them and can be considerably more efficient when computing 
the smallest singular triplets. 
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1. Introduction. We assume that a large sparse matrix A £ 7?.*^^^, M > N 
has full column rank and let 



be its singular value decomposition (SVD) [21 [35], where U — (wi, 1*2, . . . , um) = 
{1/1,1/2) and V ~ (vi,V2, ■ ■ ■ ,vn) are Af x M and N x N orthogonal matrices, 
Ui = (ui, U2, ■ • ■ , un) and S = diag{ai, (T2, . . . , (Tat) is diagonal. ai,i = 1,2, . . . , N , 
are called the singular values of A, Ui's and u^'s are the associated left and right sin- 
gular vectors, respectively, and (ai,Ui,Vi)^s are called singular triplets. In this paper, 
slightly different from the convention, the singular values are labeled as cti < 172 < 

• • • < CTN- 

We are concerned with the following problem. 

Problem 1. Compute numerically the k smallest singular triplets {ai,Ui,Vi) of 
A, i ^ 1,2, . . . ,k, where fc < iV. 
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There are many applications of Problem 1 , including determination of numerical 
rank and of spectral condition number, least squares problems, total least squares 
problems, regression analysis, image and signal processing, pattern recognition and 
information retrieval, to name a few. 

Consider the {M + N) x (Af + A^) augmented matrix 

Then, the eigenvalues of A are just ±(Ti, . . . , itcrjv and M ~ N zeros, the associated 
eigenvectors of Ci and — dj are (^uj,vj) and (^uj,—vj) , respectively, and 

the eigenvectors associated with zero eigenvalues have the form (v7 , 0"'")"'", where it's 
are orthogonal to all mi, . . . ,ujv- Therefore, we obtain the following formulation of 
Problem 1. 

Problem 2. Compute numerically the k smallest positive eigenvalues and the 
associated eigenvectors of A. 

For the k smallest eigenpairs of A, Problem 2 is a symmetric interior eigenvalue 
problem. Since M and N are assumed to be large, we can only resort to projection 
methods. A typical method is the symmetric Lanczos method It and other 

standard projection methods usually favor the extreme eigenvalues and the associated 
eigenvectors but are generally very inefficient for computing interior eigenpairs [28] . 
Another drawback is that in finite precision the computed eigenvalues do not come 
in plus-and-minus pairs and the computed eigenvectors do not respect the special 
structures that the true eigenvectors have. 

Because of the mentioned drawbacks, we should not work on A explicitly for 
computing the smallest singular triplets of A. Instead we attempt to solve Problem 
1 directly by working on A implicitly. It appears that Lanczos bidiagonalization type 
methods [1 [m [H HOI [H [H [121 and Jacobi-Davidson SVD type methods [mill] can 
solve the mentioned problems elegantly. The Lanczos bidiagonalization type methods 
available have in common that they are all based on the Lanczos bidiagonalization 
process to build up orthonormal bases of certain Krylov subspaces. However, their 
mathematical backgrounds can be fundamentally different. Basically, there are three 
kinds of projection principles that extract different approximate singular triplets with 
respect to the subspaces. Some methods use the standard projection principle [H [281 
[31] to extract Ritz approximations J21[l[Tni[ni[Ill[inilll[lSl[S3], some methods use 
the harmonic projection principle [2 [32l [34] to extract harmonic Ritz approximations 
[31[31[II1[I11[11] and some methods use the refined projection principle [H [T31 [311 [M] to 
extract refined singular vector approximations [HI [IHl [H] . Jacobi-Davidson type SVD 
methods for Problem 1 have several versions that are based on the three projection 
principles as well as their generalizations, respectively. As observed and claimed in 
[3l I12j. the refined extraction version appears to give the best accuracy in general. 

For Problem 1, due to the storage requirement and computational cost, all the 
Lanczos bidiagonalization type methods as well as Jacobi-Davidson type methods 
have to be restarted generally in order to make them converge. That is, for given 
projection subspaces, if the methods do not converge, then one repeatedly chooses 
new better starting vectors, constructs better subspaces and computes new approx- 
imate singular triplets until they converge. The implicit restarting technique due to 
Sorensen [3D| is a powerful tool for restarting Krylov subspace algorithms in various 
contexts including large SVD problems [l[l[ni[Il[ini[llllll[lS]- The success of an 
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implicitly restarted algorithm heavily depends on both the underlying method itself 
and a proper selection of the shifts involved; see, e.g., [TSUSHI- Based on the Lanc- 
zos bidiagonalization method and one of its harmonic versions, Jia and Niu 20J and 
Larsen 24J have developed an implicitly restarted Lanczos bidiagonalization algorithm 
(IRLB), and Kokiopoulou et al. [22 have proposed an implicitly restarted harmonic 
Lanczos bidiagonalization algorithm (IRLANB) for computing the smallest singular 
triplets. IRLB uses the unwanted Ritz values and IRLANB uses the unwanted Ritz 
or harmonic Ritz values as shifts, respectively. These shifts are called exact shifts 
and harmonic shifts. Baglama and Reichel [3', '5^ propose a thick restarting technique 
that explicitly augments small subspaces with certain Ritz or harmonic Ritz vectors, 
leading to augmented restarted Lanczos bidiagonalization algorithms (IRLB A). Her- 
nandez et al. jlOj analyze a parallel implementation of this algorithm. Based on 
Stewart's work for large eigenproblems [3T], Stoll [35] presents a Krylov-Schur type 
algorithm that is restarted explicitly and is easily implemented. 

It is shown in [5D] that the Lanczos bidiagonalization method may fail to com- 
pute singular vectors though it converges for computing singular values for sufficiently 
good subspaces. To correct this deficiency, applying the refined projection principle 
proposed by the first author [IS] (see also [U [32J [31]), we have proposed a refined 
Lanczos bidiagonalization method, analyzed its convergence and developed an implic- 
itly restarted refined Lanczos bidiagonalization algorithm (IRRLB) [50] • Based on the 
refined approximations to singular vectors, we have proposed refined shifts that are 
theoretically better than the exact shifts used within IRLB. Numerical experiments 
have demonstrated that IRRLB often outperforms IRLB [20l [24] considerably and is 
more efficient than several other available schemes: PRO PACK [24], LANSO [23] [24]. 
the MATLAB internal function svds and some others when computing the largest and 
smallest singular triplets. 

Hochstenbach [TT] [T^] shows that for nested subspaces Ritz values approach the 
largest singular values monotonically but approach the smallest ones irregularly. So 
the Lanczos bidiagonalization method is more suitable for computing the largest sin- 
gular triplets and may exhibit irregular convergence behavior when computing the 
smallest singular triplets. In contrast, the smallest harmonic Ritz values converge to 
the smallest singular values monotonically from above and may be better approxi- 
mations. We continue to study how to compute the smallest singular triplets more 
efficiently in this paper. Based on the Lanczos bidiagonalization process, we propose 
a harmonic Lanczos bidiagonalization method by combining it with the harmonic 
projection principle. Our derivation is different from that in [31[12]- The method is 
the same as that in [3] but different from the one in [55]. We prove that for good 
enough projection subspaces harmonic Ritz values converge if the columns of A are 
strongly linearly independent. On the other hand, harmonic Ritz values may miss 
some desired singular values when the columns of A are almost linearly dependent. 
So harmonic Ritz values may not be reliable. Furthermore, harmonic Ritz vectors 
may converge irregularly and even may fail to converge. These results imply that 
either implicitly or explicitly restarted algorithms may converge very slowly, con- 
verge irregularly or fail to converge. To circumvent these drawbacks, combining the 
harmonic projection principle with the harmonic projection principle, we propose a 
refined harmonic Lanczos bidiagonalization method that takes the Rayleigh quotients 
of harmonic Ritz vectors as more accurate and reliable approximate singular values 
and extracts the best approximations to the desired singular vectors from the given 
subspaces that minimize the residuals formed with the Rayleigh quotients. We prove 
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that refined harmonic Ritz approximations converge once the Krylov subspaces are 
good enough and the Rayleigh quotients converge. We then develop an imphcitly 
restarted refined harmonic Lanczos bidiagonafization algorithm (IRRHLB). Based on 
the refined harmonic Ritz approximations to the desired singular vectors, in the spirit 
of Jia's work [151 [l2]j we propose a new shifts scheme, called the refined harmonic 
shifts, that we show to be theoretically better than the harmonic shifts used within 
the implicitly restarted harmonic Lanczos bidiagonafization algorithm (IRHLB) and 
IRLANB. Motivated by [121 [T7], we propose an efficient procedure to compute the 
refined harmonic shifts accurately. 

It is worth noting that Kokiopoulou et al. |22j also use the refined projection prin- 
ciple to compute the refined harmonic Ritz approximations. They exploit the lower 
Lanczos bidiagonafization process, use the Ritz or harmonic Ritz values as shifts in 
the algorithm and compute the smallest singular triplets one by one by exploiting de- 
flation. They only use the refined projection principle as refinement postprocessing at 
the end of each restart. The authors demonstrate that computing refined (harmonic) 
Ritz vectors and thus refined Ritz values benefits the overall convergence process. In 
particular, they show that while convergence is not apparent in terms of harmonic 
residual norms, monitoring refined residuals predicts convergence more accurately 
and safely. In contrast, based on the upper Lanczos bidiagonafization process and 
the refined projection principle, we propose a truly new method-the refined harmonic 
Lanczos bidiagonafization method that computes refined harmonic Ritz vectors as new 
approximations. We then develop IRRHLB with use of the new better shifts, called 
refined harmonic shifts, based on refined harmonic Ritz approximations. IRRHLB 
computes all the desired smallest singular triplets simultaneously. 

The paper is organized as follows. In §2, based on the Lanczos bidiagonafization 
process, we derive the harmonic Lanczos bidiagonafization method and then present 
some basic and important properties of approximate singular vectors to be used later. 
Exploiting Jia's results in [19j . we then make a convergence analysis. In §3 we propose 
the refined harmonic Lanczos bidiagonafization method. We prove that the refined 
harmonic Ritz approximations converge for good enough subspaces once the Rayleigh 
quotients converge. In §4, we consider selection of the shifts involved. For IRHLB, 
similar to what is done in [21 [22], we use the harmonic Ritz values. For IRRHLB, 
by exploiting the available refined harmonic Ritz approximations, we propose the 
refined harmonic shifts that are proved to be theoretically better than the harmonic 
shifts. We then present an efficient procedure to compute them. We show that in 
finite precision the refined harmonic shifts can be computed accurately. Meanwhile, 
we extend the adaptive shifting strategy proposed by Larsen |[24 and modified by 
Jia and Niu '2D] to IRHLB and IRRHLB. In §5, we report numerical results and 
compare IRRHLB with the five other state of art algorithms: IRHLB, IRRLB, IRLB, 
IRLANB and IRLBA, indicating that IRRHLB is at least competitive with the five 
other algorithms and can be considerably more efficient when computing the smallest 
singular triplets. Finally, we conclude the paper with some remarks in §6. 

We introduce some notations to be used. Denote by |j • |j the spectral norm of a ma- 
trix and the vector 2-norm, by k{A) = by /C„i(C, vi) — span{vi, Cvi, . . . , C^^^^vi} 
the m-dimensional Krylov subspace generated by the matrix C and the starting vi , by 
the superscript 'T' the transpose of a matrix or vector, by / the identity matrix with 
the order clear from the context and by the m-th coordinate vector of dimension 
m. 
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2. The harmonic Lanczos bidiagonalization method and convergence. 

Golub et al. [5] propose a Lanczos bidiagonalization method that can compute either 
the largest or the smallest singular triplets of A. The method is equivalent to the 
symmetric Lanczos method for the eigenproblem of A starting with a special vector 
[H [28] and is based on the Lanczos bidiagonalization process [3 [9j [27] , which satisfies 
the following relations if it does not break down before step to: 



(2.1) AQm — PmBm, 

(2.2) A^P„, = Qr^Bl + f3,nq,n+ie 



m ' 



where the m x m matrix 



(2.3) Bra = 



a2 



/3m- 1 

and the columns of Qm = [qi^ 12, ■ ■ ■ , Qm) and Pm = {pi,P2, ■ ■ ■ ,Pm) form orthonormal 
bases of the Krylov subspaces ICm{A^A,qi) and ICmiAA^,pi), respectively. So we 
have 

(2.4) P^AQm = B,n. 

The Lanczos bidiagonalization method computes the singular triplets {ai, Si,Wi), 
i = 1, 2, . . . , TO of Bm and then uses some of (ct^, PmSi, QmWi), called the Ritz approx- 
imations, to approximate the largest and/or smallest singular triplets of A. 

In finite precision, the columns of Pm and of Qm may rapidly lose orthogonality. A 
partial reorthogonalization strategy [231 124] is an effective technique for maintaining 
numerical orthogonality. However, Simon and Zha |29| show that it generally suf- 
fices to partially reorthogonalize only Pm or Qm rather than reorthogonalizing them 
simultaneously. This may reduce the computational cost considerably when only re- 
orthogonalizing the columns of Qm for M ^ N. In our codes, we adopt the strategy 
from [3J which is based on ^221- 

Given the subspace 



E = span 



Pm 

Q„. 



the harmonic projection method of A computes {Oi^(pi) satisfying the requirements 
\ [A- 9,1)^, ± AE 



(2.5) 



and uses them as approximations to some eigenpairs of A |TJ [3H [51] . 

Making use of l|2.4p , we see that (|2.5p is equivalent to the generalized eigenproblem 



(2.6) 



Bm \ s, \ _ I BmBi + Plemel, \ f s. 



Bl j \ wj 9A BlBm I \w. 



Bm is nonsingular as A has full column rank and its singular values interlace those 
of A [21 p. 449]. This is a symmetric positive definite generalized eigenproblem, so 
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its eigenvalues are all real and nonzero [51 [32] ■ Furthermore, we present the following 
result. 

Theorem 2.1. If (9i,Si,Wi) satisfies (|2.6p . then {—Oi,Si^—Wi) or equivalently 
{—9i,—Si,Wi) satisfies (|2.6p too, that is, the eigenvalues of p.6p come in plus-and- 
minus pairs and the eigenvectors have a special structure. 

Proof. Equation (|2.6p gives rise to 

BmWi = — (^i?m-B„j + p^emeyy^) Si, 

So we can readily see that the assertion holds. □ 

Assume that the nonnegative eigenvalues of p.6p are ordered as 

Q<9i<e2<--- <9k+i, 

where k + I = m. Then we use 

(2.7) {9^,Ui = PmSi/\\s^\\ = PmSi,Vi = QrnWi/\\m\\ = QmWi),i =l,2,...,k 

as approximations to the k smallest singular triplets (ci, Ui,Vi). This method is called 
the harmonic Lanczos bidiagonalization method. 9iS are called the harmonic Ritz 
values, iiiS and ViS the (left and right) harmonic Ritz vectors, and (^i, Ui, 'Di)'s the 
harmonic Ritz approximations. It is proved in \12\ that 

(2.8) (T^<9„ i^l,2,...,m. 

We have the following basic and important properties, which will play a key role 
in §4.2. 

Theorem 2.2. For i^ j it holds that 

(2.9) sjB^Wj = 0, wjBlsj = 
and 

(2.10) ujAij = 0, iJA^Uj = 0. 

Proof. Since (|2.6p is a symmetric positive definite generalized eigenproblem, for 
i j we have 

from which it follows that (j2.9p holds. 

p.lOP follows from Ui = PmSi/\\si\\, Vi = QmWi/\\wi\\ and ()2.4p . □ 

We now discuss efficient computation of harmonic Ritz approximations. It is 

disappointing that (|2.6p is a 2m x 2m generalized eigenproblem. Fortunately, we 

can reduce (j2.6p to a half sized SVD problem that can be solved more cheaply and 

accurately, as shown below. 
We get from ([23|) 

(2.11) {B^Bl + /?^e„e^)s, ^ 9,B,^w„ 

(2.12) BraW, = 9,s„ 
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from which it foUows that 

(2.13) {B^Bl + Ple^eDs, = Qls,. 

So, the {Qi, Si)'s are the singular values and right singular vectors of the (m + 1) x m 
matrix 

/ oT 

(2.14) . "t 

\ Pm&m 

and the left singular vectors 

(2.15) = B.B^s,. 

Therefore, we can get (0^, Si) more accurately and efhciently by computing the SVD 
of the half sized (|2.14p and obtain all the w^'s by solving the bidiagonal linear systems 
BmWi = 9iSi at a total cost of 0{m?) flops. 

We comment that, based on the harmonic projection of A onto K.m{j^ A, gi), 
Baglama and Reichel [31 H] also derive (|2.13p - (|2.15p . Our method is the same as 
that in 3J but is different from the one in [52], which is based on the lower Lanczos 
bidiagonalization process. 

From (|TT|) and (|^ . we have 

\\Avi - OiUiW = \\AQ„iWi - OiPraSiW = \\P„iBrnWi - OiPmSi\\ = \\BmWi - 9^Si\\ 

and 



Therefore, if 



Aw, - 9,u,\\^ + \\AT^u, - 9,i,,\\^ = ^J\\B^w, - 9,S^\\^ + ||BT.5, - 9,w,\\^ + pl\e'^S^\^ 

(2.16) < tol, 

where tol is a user prescribed accuracy, then the method is accepted as converged for 
tol. 

We now analyze the convergence. Jia [TS] establishes a general convergence theory 
of harmonic projection methods for large eigenproblems. The theory can be adapted 
here. 

From p.6|) . set the matrices 



B 



B„ 
Bl 



and 

C 



T 



BmBm 

Recall from Theorem 12 . II that zt9i's are the eigenvalues of B^^C. The following result 
is direct from Theorem 2.1 and Corollary 2.2 of jl9j. 

Theorem 2.3. Assume that (cr, u,w) is a singular triplet of A and define e = 
sin Z ((") to be the distance between the vector (") and the subspace E. Then 
there exists a perturbation matrix F satisfying 

(2.17) \\F\\<^J=\\B-ma\\A\\ + \\Ar), 
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such that the exact singular value a of A is an eigenvalue of B + F. Furthermore, 
there exists an eigenvalue 6 of B^^C satisfying 

(2.18) \e-a\<{2\\A\\ + \\F\\)\\F\\. 



This theorem shows that if e tends to zero and is uniformly bounded then 

there always exists one harmonic Ritz value 6 that converges to the desired singular 
value a. The interlacing theorem of singular values [9^, p. 449] tells us that 

— <\\B-'\\ = \\B-,'\\<- = \\A+\\ 

is uniformly bounded. As a result, if e = sin Z (( ") , i?) ^ 0, we should have 6 ^ a. 

However, the situation is by no means so simple, and it is instructive to see what 
will happen when only speaking of e small. Note that 

\\B-^\\A\\<\\A+\\\\A\\=k{A). 

So if k{A) = 0( j), that is, the columns of A are almost linearly dependent, then ||F|| 
may not be near zero, so that 16* — cr| may not be small. Actually, ~* ^ = ll^"*"!! 

once the smallest singular value (Ritz value) of B^ converges to CTi. In this case, the 
harmonic Lanczos bidiagonalization method may miss a. So the method may not be 
reliable, and 9 is only guaranteed to be a good approximation to a only if e is very 
small and the columns of A are strongly linearly independent. 

To improve convergence and reliability of the method, we recommend the Rayleigh 
quotient pi = ufAvi — sj BmWi as a new approximation to ai, as was also done in (12j . 
Pi is more accurate and reliable than Oi. Correspondingly, 9i in ()2.16p is replaced by 
Pi. We refer to |19j for more theoretical results and arguments on such a replacement. 

The following result is a direct application of Theorem 3.2 of [19] . 

Theorem 2.4. Let {9,z = (^,)) be an eigenpair of B~^C, and assume that Z±^ 
is such that the square matrix (z, Z^) is orthogonal and transforms B~^C into 

(2.19) (j^)i^-C(.,Z,)=(^ -^J 
Then if 

(2.20) sep(cr, G) > 0, 



it holds that 
(2.21) sinZ((n,(:))<|l 



2\\B-^\\A\\ 



(2.22) < 1 



\/l — e^sep{a, G) I 

211^-^1111^11 
^/^^72(sep(6l, G)-\a- 




Theorems 12.41 states that if the separation sep{9, G) of 9 and the other harmonic 
Ritz values is uniformly bounded below by a positive constant then the harmonic 
Ritz approximations w, v converge. Unfortunately, however, for a general A, sep(0, G) 
can be arbitrarily near zero, i.e., 9 can be arbitrarily close to the other harmonic 
Ritz values. As a result, the upper bounds (|2.2ip and ()2.22p can converge to zero 
very slowly and irregularly and even fail to do so as e ^ 0. This means that the 
approximate singular vectors may converge very slowly and irregularly and even may 
fail to converge. 
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3. The refined harmonic Lanczos bidiagonalization method. The previ- 
ous analysis shows that the harmonic Ritz approximations may converge slowly and 
irregularly and even may fail to converge. To overcome this intrinsic drawback, we 
now combine the harmonic Lanczos bidiagonalization method with the refined pro- 
jection principle and derive a refined harmonic Lanczos bidiagonalization method. 
Recall that pi = sjBmWi. We use {pi,tpi) satisfying 

(3-1) < [A-e^Il^r ± AE, 

I WAijjt - P^■4^^\\ = min.0gE,||^||=i - PjVII 

to replace (0^, (pi) as a new approximation to an eigenpair of A. The method first uses 
the harmonic Lanczos bidiagonalization method to compute {9i,(pi) and then forms 
the Rayleigh quotient pi — sf BmWi. With each pi, i ~ 1,2, k, it computes ipt by 
solving the minimization problem. 

The following results adapted from [T31 [TT] can be used to compute ipi efficiently 
and accurately. 

Theorem 3.1. Let Zi — [xj ,yj)'^ he the right singular vector of the matrix 




Bm \ 












/ 


) 








associated with its smallest singular value Urain- Then 



(3.3) WAlpi - p^^iW = O-min- 

With -tpi at hand, we define the new left and right approximate singular vectors 

as 

(3.4) iti = PmXi/\\Xi\\ = PrnXi, Vi = QmUi/WViW = QniVi 

and use {pi,Ui,Viys to approximate the k smallest singular triplets of A. We call 
{pi, Ui,Vi) a refined harmonic Ritz triplet and Ui,Vi a refined harmonic Ritz approxi- 
mation. 

Similar to (j2.16p . {pi,Ui,Vi) is accepted as converged if 

\l\\Avi - p^U^\\'^ + \\A^Vi - PiViW^ ^ ^ WBmVi - PiXi\\^ + \\B^Xt -~ PiViW^ +/3^|emiip 

(3.5) < tel. 

The following result is taken directly from Theorem 4.1 of [2T| . 
Theorem 3.2. Let {a,u,v) be a singular triplet of A. Then there exist U± and 
V±_ such that {u,Uj_) and {v,Vj_) are orthogonal and 

(3.6) (S)^(^'^-)-(o I 
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where L = UlAV±. Set 

L 



L 
LT 



and assume that {p,u,v) is approximating {a,u,v). Then if 
(3.7) sep(p, L) > sep(cr, L) — \p — a\ > 0, 

we have 

\\A-pI\\e+\p-a\ 



< 



\/l — e^(sep(o-, L) — |p — <t\ 



Note that sep(cr, L) is the gap of a and the other singular values of A and is a 
fixed constant. Theorems 13.21 shows that the refined harmonic Ritz approximations 
converge once e — + and p ^ a. Therefore, the refined harmonic Lanczos bidiag- 
onalization method overcomes, to great extent, the possible non-convergence of the 
harmonic Ritz approximations. 

4. The implicit restarting technique, shifts selection and an adaptive 
shifting strategy. 

4.1. The implicit restarting technique. Due to the storage requirements and 
computational cost, in practice, the number of steps cannot be large and must be lim- 
ited. For a relatively small m, however, the m-dimensional subspaces JCm{A"^A,qi) 
and ICm{AA"^,pi), in general, do not contain enough information on the desired right 
and left singular vectors, so that both the harmonic and refined harmonic Lanczos 
bidiagonalization methods do not converge. Therefore, it is necessary to restart the 
methods. The idea is to repeatedly update new starting vectors based on the informa- 
tion available and construct increasingly better Krylov subspaces until the methods 
converge. Implicit restarting is usually preferable not only because of efficiency of the 
restart procedure, but also because the implicit procedure is more effective at locking 
in desired directions and purging unwanted ones. 

We briefiy review the implicit restarting technique for the Lanczos bidiagonaliza- 
tion process [51 [SJ. Note that m — k + I. After running / implicit QR iteration steps 
on Bm using the shifts pj, j — 1,2, . . . ,1, we get 



(4.1) 



{BlB„, - pll) ■ ■ ■ {BlB„, - pfl) = PR, 
P^^BmQ upper bidiagonal. 



where P and Q are the accumulations of Givens rotations applied to Bm from the left 
and right, respectively. Define Q+ — QmQ, Pm — PmP and = P^B,nQ- This 
process is achieved implicitly from B^^^Bm to (_B+)'^_B+ by working on Bm directly. 
Performing the above / implicit QR iteration steps gives the following relations 

m- 

(4.2) AQ+=P+B+, 

(4.3) A^P+ = Q+Bf + (MmMra+l + Pt4+l)^l, 

where Pm,k is the entry of P in position (m, k) and the updated starting vector has 
the form 

I 

(4.4) ^qt ^X{{A'A-p]l)q, 
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with 7 a factor making \\qi\\ — 1. Since f3kPm,kQm+i + PtO-k+i orthogonal to Q^, 
we have obtained a fc-step Lanczos bidiagonahzation process starting with . It 
is then extended to a m-step Lanczos bidiagonahzation process in a standard way. 
So we avoid restarting the process from scratch and do it from step fc + 1 upwards. 
This saves the computational cost of the first k steps of the process. Applying the 
implicit restarting technique to the harmonic Lanczos bidiagonahzation method and 
its refined version in such a way, we have formally sketched an implicitly restarted 
harmonic Lanczos bidiagonahzation algorithm (IRHLB) and an implicitly restarted 
refined harmonic Lanczos bidiagonahzation algorithm (IRRHLB). 

4.2. Shifts selection. We can run IRHLB and IRRHLB once the shifts /ii,/X2, 
. . . , /i/ are given. However, in order to make them work as efficiently as possible, we 
should select the best possible shifts in some sense for each algorithm. In the same 
spirit of [15l [ITj , it has been shown in f20] that if the shifts are more accurate approx- 
imations to some of the unwanted singular values of A then the resulting subspaces 
contain more information on the desired singular vectors. The better the subspaces 
are, the faster IRHLB and IRRHLB may converge. For eigenproblems and SVD prob- 
lems, Morgan [Ul US] and Kokiopoulou et al. [22^ suggest using unwanted harmonic 
Ritz values as shifts, called the harmonic shifts here. These shifts are natural choices 
as they are the best approximations available to some of the unwanted eigenvalues 
and the unwanted singular values, respectively. So, for our IRHLB we also use the 
I unwanted harmonic Ritz values Ok+i, Ok+2, • ■ ■ , as shifts. Since the refined har- 
monic approximations Ui,Vi are optimal in the sense of residual minimizations, they 
are generally more accurate than the harmonic Ritz approximations Ui, Vi. Therefore, 
based on Ui, Vi, i — 1,2, . . . , k, it should be possible to find better possible shifts than 
the harmonic shifts. 

The following important result on the harmonic shifts is crucial for us to introduce 
and understand new better shifts for use within IRRHLB. 
Theorem 4.1. Define 

U = (ui, . . . ,Uk), V = {ill, . . ■,Vk), 

U± = (±Ufe+l, . . . ,±U„), V± = (±-5/0+1, • • • ,±Wm)- 

Then the harmonic shifts Ok+i, Ok+2i ■ • • j the absolute values of the I harmonic 

Ritz values of A with respect to the subspace span{{Uj^,V^)'^} . 

Proof. From definition (j2.5p of the harmonic projection as well as the relationship 
between (|2.5p and (|2.6p , it is easily verified that for i = fc + 1, . . . , to, if the j-th column 
of U± and that of V± have the same or opposite ± sign, then A has 9i or —9i as one 
harmonic Ritz value with respect to the subspace span{{Ij]^, V^)"""}. □ 

We see from ([XTU)) that 

f/T^Vl = 0, V'^A^UjL = 

and 

span{V} ® span{V±} = span{Qm}, span{U} © span{U±} = span{Pm} ■ 

Define U — (ui, . . . , iik) and V — {vi, . . . , Vk), and let U±, Vj_ be matrices with 
I — m — k columns satisfying 



(4.5) U'^AVi_ = 0, V^A^(ji_ = 
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and 

(4.6) span{V} © span{V±} = span{Qm}, span{U} span{U±} = span{Pm} , 

where © denotes the direct sum. Jia [18 derives a number of theoretical results that 
compare refined Ritz vectors and Ritz vectors. At this moment, we temporarily regard 
A as a general matrix, and ipi, ipi are a Ritz and the corresponding refined Ritz vector 
of A with respect to a general subspace E, respectively. One of Jia's results says that 
we always have 

\\[A~p,im\ < \\iA-pd)ip,\\ 

if the left-hand side is not zero and 

\\{A-pJ)tP,\\ « 

may occur if pi is close to some 9j for j =/= i. By standard perturbation theory in 
terms of residual norms, these two results demonstrate that i/ji is more accurate and 
can be much more accurate than cpi. Here we should point out that these claims 
hold without requiring that E is sufficiently good. Jia [18] constructs a number of 
symmetric matrices having well separated simple eigenvalues and accurate subspaces 
to illustrate this. More precisely, assuming that pi and (pi, ipi are used to approximate 
the eigenvalue and the eigenvector ipi of A and e is the distance between ipi and 
the subspace E, Jia's examples show that we can indeed have 

\p,-M^O[e), \\{A- pj)i^,\\=0[e), ||(i ~ = 0(1). 

The above results are easily adapted to the harmonic and refined harmonic Ritz 
vectors. Some similar symmetric matrices are constructed by Jia in jT9] for which 
the harmonic Ritz vectors have no accuracy at all but refined harmonic ones have 
accuracy 0(e). Coming back to our SVD context, the above results and analysis 
indicate that Ui and Vi are more accurate and can be much more accurate than Ui 
and Vi without the assumption that projection subspaces are sufficiently good. 

Based on the above, it is evident that the subspaces span{U±^ and span{V±} 
contain (possibly much) more accurate approximations to ±Ui and ±fi, i = k -\- 
l,fc + 2,...,iV than the subspaces span{U±} and span{V±} do. This, in turn, 
means that the subspace span{{tjJ_,V]^)'^} contains (possibly much) more accurate 
approximations to the eigenvectors -^{uj ,:kvj)'^ associated with the eigenvalues 

±f7i, « = fc + 1, /c + 2, . . . , A^, oi A than the subspace span{{Uj^, Kl )'^} does. Recall 
from Theorem 12.31 that a better subspace should generally produce more accurate 
harmonic Ritz values. Hence, combining with Theorem 14. 1[ we have come to the 
following key result. 

Theorem 4.2. As approximations to some o/ Ufc+i, . . . , (tjv, the absolute val- 
ues of the harmonic Ritz values ^i, i — 1, 2, . . . , Z of A with respeet to the subspace 
span{{U]^,V^)'^} are more accurate and can be much more accurate than the har- 
monic shifts 0k+l,0k+2, ■ ■ ■,0m- 

This theorem holds without assuming that span{Qm} and span{P„i} are suffi- 
ciently good. It suggests that we use better i — 1,2, . . . ,1 as shifts for use within 
IRRHLB. We call them the refined harmonic shifts. 

Computationally, at first glance, it seems quite complicated and expensive to get 
the refined harmonic shifts as it involves constructing Uj_,Vj_ that are related with 



A REFINED HARMONIC LANCZOS BIDIAGONALIZATION METHOD 



13 



the large A. Inspired by the tricks in [TH1|T7], however, we can exploit (|2.4p to propose 
an efScient procedure for computing them accurately, as shown below. 

Recall p.4p and define X = (ii, . . . , Xk) and Y — {yi, . . . , yk)- We use House- 
holder transformations to compute the full QR decompositions 

(4.7) BlX - ) , bJ^ = Qy(^ ^q' 
which costs 0{m?) flops. Partition 

(4.8) Qx = {Qxi,Qx2), Qy = (Qyi, Qy2), 

where Qxi and Qyi are the first k columns of Qx and Qy, respectively, and let 

(4.9) = PrnQY2, V± = QmQx2- 

Then it can be readily verified that 

U^AV^ = X^P7MnQx2 = X^B„,Qx2 = 0, 
V^A^U^_ = Y^QlA^P,nQY2 = Y^bIQy2 - 0. 

So U and V defined in this way meet conditions (14. 5p and (14. 6p and are just what we 
need. By (|4.6p . we have 

(4.10) span{U} = span{P,nQYi}, span{V} = span{QmQxi}- 

The harmonic Ritz values S^i, i — 1,2, . . . ,1 oi A with respect to (C/J, V^)^ satisfy 

Exploiting (|2.4p . we get a, I x I symmetric positive definite generalized eigenvalue 
problem 

T /-)T \ 1 Bjn \ I Qy2 



{Qy2tQx2) 



Bl 



{X2 



(4.11) l{Ql2.Ql2) ( V"'"'" i?jA„ ) (fe) 3- 

The ^i's are computed by the QZ algorithm [21131] using 0{l^) flops. So the total cost 
of computing the refined harmonic shifts is 0{ra^) flops, negligible compared with the 
harmonic Lanczos bidiagonalization method. 

We give more details on computation of the refined harmonic shifts. Denote by 
F„i and Gm the matrices of the left and right-hand sides in (|4.1ip . respectively, and 
observe that 

(4.12) Fm ~ QY2BmQx2 + Qx2BZiQy2, 

(4.13) Gm = {BlQY2f {B1Qy2) + l3l{elQY2f{elQY2) + {BmQx2f {BmQx2). 

Noting that the two matrices in are transposes each other, we only need to form 
QY2BmQx2 by computing {QY2Bm)Qx2 

is then used to form Gm- Since Gm is symmetric, we only need to compute its 
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upper triangular part. The total cost of forming F,„ and G„i is 0{m^) flops. We 
then compute the eigenvalues ^'s of the symmetric positive definite matrix pencil 

Now we show that in finite precision the above procedure is numerically sta- 
ble and can compute the refined harmonic shifts accurately. There are three major 
steps in the procedure: the QR decompositions in (|4.7p . computation of i^,„ and Gm 
and the solution of the eigenvalue problem of {F„i,Gm) by the QZ algorithm. Note 
that the QR decompositions can be computed using Householder transformations in 
a numerically stable way (we use the Matlab built-in code qr in our implementa- 
tion) and the QZ algorithm are numerically stable. Therefore, omitting details on 
roundoff errors, we finally compute the eigenvalues ^'s of a perturbed matrix pen- 
cil (Fm + 5Fm,Gm + SGm), whcrc SFm and 5Gm are the matrices of roundoff error 
accumulations and satisfy 

(4-14) 11 , II II - (^(einach) 

with eniach being the machine precision and || • ||i? the Frobenius norm. 

For an eigenpair {^-.g) of the pencil {Fm,Gm) with ||g|| — 1, let a — g^Fmg and 
/3 — g'^Gmg, so that (/3, a) is a projective representation of the eigenvalue |- [321 p. 
135]. Then it is known [32l p. 233] that there is an eigenvalue i of the matrix pencil 
{F„i + SFm, Gm + SGm) such that the chordal distance 

A I, ^ V\\SFm\\% + \\SGm\\l , 2 , 

It is important to point out that for not too small ^ the choral distance behaves 
like the ordinary distance | C ~ C h see a remark in ^32j p. 140]. So, how accurate ^ is 
depends on the |-'s condition number 

(4.16) 



v/a2 + /32 

If one of a and /3 is not small, ly is not large and thus by (|4.14p the relative error of ^ 
is O(emach) if |^| is not very small. 

We look at the smallest |^|. By Theorem 14.21 it is known that the absolute values 
l^l's better approximate some of (Tfe+i, . . . , ctat than O^+i, . . . , Om- Furthermore, recall 
from (|2.8p that dk+i > (Jk+i- So the smallest |^| is approximately bounded below by 

In the following, we establish lower bounds for |q;| and and an upper bound 
for u rigorously and prove when our proposed procedure can numerically compute the 
refined harmonic Ritz shifts accurately. 

Theorem 4.3. For any refined harmonic Ritz shift ^, we have 



(4.17) |a| > 2ak+i and \f3\ > 2cr\ 
and the ^ 's condition number is bounded from above: 

(4.18) ly < 



2 

fe-l-1 
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// ak+i is not very small, then numerically the procedure described can compute the 
refined harmonic shifts \ 's with relative accuracy ©(cmach)- 

Proof. We prove (|4.18p and (??) in turn. To prove (|4.18p . we first estimate |a| 
and then \(3\. From the definition of Fm and Gm, we have 

(4.19) \a\ = 2\g'^{Ql2BmQx2)9\ > {QY2BmQx2), 

where CTmin (C) denotes the smallest singular value of a matrix C. Note that PmQy and 
QmQx form orthogonal bases of the left subspace span{U} (B span{U±} = span{Pm} 
and the right subspace span{V} span{V±} = span{Qm}- Exploiting p.4|) . (|4.8p . 
(|4.9p and (|4.10p and keeping in mind that span{PmQYi} = U and span{QmQxi} = 
span{V}, we obtain from (|4.5[) that the projection matrix of A with respect to PmQy 
and QmQx is 

{PrnQyfAiQraQx) = ( S'S ) A{QmQ XI, QmQ X2) 
\ ^Y2^m J 



U 



T™ ] A{QmQxi,Vi_) 



Qvi^rnQxi 

C/JAVI 

QyiBmQxi 

QY2BmQx2 

whose singular values 9i, i — 1,2, ... ,m, labeled in increasing order, are the union of 
the singular values of QyiBrnQxi and Q^2BmQx2 and are just the Ritz values of A 
with respect to the left and right subspaces span{Pm} and span{Qm}- By the singular 
value interlacing property, we have ai < di, i = 1,2, ... ,m. Furthermore, note that 
QyiBmQxi and QY2BmQx2 are the projection matrices of A with respect to the left 
subspaces span{U} and span{U±} and the right subspaces span{V} and span{V±}, 
respectively. Therefore, the singular values of QyiBmQxi are 9i, i = l,2,...,k 
and approximate the k desired smallest singular values Ci's from above, while the 
singular values of QY2BmQx2 = UXAV± are 6i, i — k + 1, . . . ,m and approximate 
(Tfc+i, . . . , (Tm from above too. In particular, we have 

So it holds that 

\a\ > 2ak+i. 

Next we estimate j3. We obtain from ([iTT^ . and ([^ 

P = 9^{Ql2BmBj,QY2)g + PW{Ql2ememQY2)9 + g^{Qlc2BjnBmQx2)g 
= 9^{UlAA^U^)g + g^{VlA^AV^)g 
> <j,r,in{UlAA^U^) + anUVlA^AV^). 

Observe that U]^AA'^U± and V^A'^AV± are the projection matrices of AA"^ and 
A"^ A with respect to span{U±} and span{V±}, respectively. So their eigenvalues 
approximate some of the eigenvalues <jI_^_^, . . . , aj^ of AA"^ and A'^A. Furthermore, 
since 

V];A^AVj_ - {UjAV^fiUlAV^) - V];A^{I - U^Ul)AV^ 



16 



ZHONGXIAO JIA AND DATIAN NIU 



is symmetric nonnegative definite, the smallest eigenvalue of V^^A'^^Vj^ is no less than 
the smallest eigenvalue O^^-^ of {U]^AV±)'^ {U]^AV±). As a consequence, it follows from 

Ok+i > ffe+i that the smallest eigenvalue of y^^A^'^^Vj^ is bounded below by crf_^_i- 
Similarly, we have 



uIaA^U^_ - {UlAV^){UlAV^f = UlA{I - V^V];)A'^U^, 

10 

tt 

is bounded below by cr^.,.! too. Therefore, we get 



which is symmetric nonnegative definite. As a result, the smallest eigenvalue of 

-'k+l 



UlAA'^Uj_ is no less than the smallest eigenvalue 9l^^ of {JJ1AVj_){U1AVj_)'^ and 



/3> 2(7^+1 • 

So it follows from (|4.17p that the t's condition number 1/ in (|4.16p satisfies 



V < 



1 



2^fe+iVl + '^fe+i 

which is (liTTS)) . 

If (Tfc+i is not very small, then v is not large and all ^'s are not too small. Keep 
in mind the comments on (I4.14p and (j4.15p . It is then clear that numerically the pro- 
posed procedure can compute the refined harmonic shifts |^i|'s with relative accuracy 

O(einach). □ 

We should point out that Theorem l4.3l holds without any assumption on span{Qm} 
and span{Pm}, as is clearly seen from the proof. 

4.3. Adaptive shifting strategy. It has been observed [21j that if the fc-th 
desired Cfe is very near a shift then IRLB with the exact shifts (the unwanted Ritz 
values) converges very slowly and even stagnates. This is also the case for IRRHLB 
with the refined harmonic shifts and IRHLB with the harmonic shifts. The reason 
is that if some shift /i, is very near then the new starting vector will nearly 
annihilate the component of the desired Vk, so that the new subspace JCm{A'^ A,q^) 
contains very little information on Vk and pk converges to ak very slowly or not at all. 

In order to overcome this problem, for IRLB with the exact shifts, Larsen [24] 
proposes an adaptive shifting strategy for computing the largest singular triplets. He 
simply replaces a bad shift to be defined below by a zero shift. Jia and Niu |20j adapt 
it to IRRLB for computing the largest singular triplets but modify it for computing 
the smallest singular triplets. Their strategy works for IRHLB and IRRHLB: Define 
the relative gaps of pk and all the shifts fii^i — 1, 2, . . . , Z by 



(4.20) relgap,. 



(Pfe - £k) - Pi 



Pk 



where is the residual norm (|2.16p or (|3.5p . We should note that — is an 
approximation to ak- If relgap^.^ < 10~^, pi is a bad shift and should be replaced by 
a suitable quantity. 

Expand qi as a linear combination of the right singular vectors {vj}^ 



3f3 = l- 



N 

91 =H"J^J■ 
J=1 
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Then for the harmonic shifts /i^ = 9k+i, i = 1, 2, . . . , Z we have from (j4.4p 

m 

lit = n (^^^ - 

km N m 

i=fc+l i=k+l 

So if flfc+i is very near Cfc, which is the case that <Tk+i is very near crfc, then has 
a very smaU component in the direction of Vk- A good strategy is to replace 0k+i by 
the largest one among all the shifts, as this strategy amplifies the components of 
in Vi,i — 1,2, ... ,k and meanwhile dampens those in Vi,i = k + 1, . . . , N. 

The above strategy applies to the refined harmonic shifts as well. 

We now present IRHLB with the harmonic shifts and IRRHLB with the refined 
harmonic shifts, respectively. 

Algorithm 1. IRHLB with the harmonic shifts 

1. Given a unit length starting vector qi of dimension N, the steps m, the 
number k of the desired singular triplets and the convergence tolerance tol. 

2. Run the m-step Lanczos bidiagonalization process and construct Bm, Pm and 

m • 

3. Calculate the triplets (OijSij'Wi), i — 1, 2, . . . , m, by computing the singular 
values and right singular vectors of (|2.14p and by solving (|2.15p and normal- 
izing the solutions, and use the Rayleigh quotients pi = uj Avi = sj BmWi as 
approximations to Ci, 1 = 1, 2, . . . , fc. 

4. Replace 6i by pi in (|2.16p . For i = 1, 2, . . . , fc, test if p.l6p is satisfied. If yes, 
compute Ui and Vi explicitly and stop. 

5. Implicitly restart the Lanczos bidiagonalization process using the harmonic 
shifts and the adaptive shifting strategy. 

Algorithm 2. IRRHLB with the refined harmonic shifts 

1. Given a unit length starting vector qi of dimension N, the steps m, the 
number k of the desired singular triplets and the convergence tolerance tol. 

2. Run the m-step Lanczos bidiagonalization process and construct B„i, Pm and 

3. Calculate the triplets {9i, Si,Wi), i — 1,2, . . . ,to by computing the singular 
values and right singular vectors of (j2.14p and by solving ()2.15p and normal- 
izing the solutions, and use the Rayleigh quotients pi — uJ Avi = sJ BmWi as 
approximations to Ci, 1 = 1,2,..., fc. 

4. For each p^, i = 1, 2, . . . , fc, compute Xi and iji in Theorem 13. II 

5. For « = 1, 2, . . . , fc, test if (13. 5p is satisfied. If yes, compute Ui and Vi by (|3.4p 
explicitly and stop. 

6. Implicitly restart the Lanczos bidiagonalization process using the refined har- 
monic shifts and the adaptive shifting strategy. 

5. Numerical experiments. We have developed the experimental Matlab codes 
of IRRHLB, IRHLB, IRRLB and IRLB. The latter two were named IRRBL and IRBL 
in |20j and were originally developed based on the lower Lanczos bidigonalization pro- 
cess. Here we have developed their upper Lanczos bidiagonalization versions. These 
four codes call the upper Lanczos bidiagonalization process in Baglama and Reichel's 
code IRLBA, and some parameters and defaults are the same as those used in IRLBA. 
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We compare IRRHLB with IRRHLB, IRRLB, IRLB, IRLBA and IRLANB in this sec- 
tion and report numerical results. Numerical experiments were run on an Intel Core 
2 E6320 with CPU 1.86GHz and RAM 2GB under the Window XP operating system 
using Matlab 7.1 with emach = 2.22 x 10~^^. The stopping criteria are 



stopcrit = max y \\Avi — piUiW^ + WA^Ui — piVi\\^ (IRHLB) 

l<i<k V 

and 



stopcrit = max \/\\Avi — piUi\\'^ + \\A^Ui~piVi\\'^ (IRRHLB). 

l<i<k V 

If 

stopcrit 

(5.1) stopcrit — — jy-^^^ — < tol, 

then stop. Similar criteria apply to IRLB and IRRLB as well. In (j5.1|) . \\A\\ is replaced 
by the maximum of the current largest (harmonic) Ritz value and the old one obtained 
at last restart. Some parameters in IRRHLB, IRHLB, IRRLB and IRLB are described 
in Table O 

Table 5.1 

Parameters in IRRHLB, IRRLB, IRHLB and IRLB 



Parameters 


Description 


k 


Number of the desired singular triplets. 
Default value: k — 6. 


adjust 


Number added to k to speed up convergence. 
Default value: adjust = 3. 


disps 


When disps > 0, the k desired approximate singular values 
and norms of associated absolute residual error are displayed 
each iteration, disps = inhibits display of these quantities. 
Default value: disps = 1. 


M_B 


Maximum of Lanczos bidiagonalization steps. 
Default value: M.B = 20. 


maxit 


Maximum number of restarts. 
Default value: maxit — 300. 


sigma 


A 2-letter string which specifies which extreme singular triplets are to 
be computed, 'SS' for the smallest and 'LS' for the largest. 
Default value: sigma ='SS'. 


tol 


User defined relative tolerance to check convergence. 
Default value: tol = lO'^. 


vo 


min(M, Ai")-dimensional initial vector of Lanczos bidiagonalization. 
Default value: vq = randn{mm{M, N), 1). 



For large matrix eigenproblems, in order to speed up convergence, ARPACK 
(eigs) and Implicitly Restarted Refined Arnoldi Method (IRRA) [15] compute k + 3 
approximate eigenpairs, so the number of shifts is m — (/c + 3) when k eigenpairs are 
desired. This strategy adapts to Krylov type subspace algorithms for SVD problems, 
for instance, the default parameter adjust = 3 in IRLBA, which means that the 



A REFINED HARMONIC LANCZOS BIDIAGONALIZATION METHOD 



19 



updating subspaces are augmented with fc + 3 Ritz or harmonic Ritz vectors. With 
m — (fc + 3) exact shifts and harmonic shifts used, IRLB, IRHLB and IRLANB retain 
fc + 3 Ritz vectors and harmonic Ritz vectors in the updating subspaces, respectively. 

We mention that our codes IRRHLB, IRHLB, IRRLB and IRLB as weft as IRLB A 
and IRLANB (we used the newest available code irlanb_ review) do not involve any 
shift- and-invert matrix when computing the smallest singular triplets, while the Mat- 
lab internal function svds needs to factorize A of (|1.2p . In this context, we assume 
that A is too large to allow any factorization of A due to excess memory and/or 
computational cost, so we do not compare the above six algorithms with svds. 

Our experiments consist of three subsections. In the first two subsections, we 
test IRRHLB on a set of matrices having the clustered smallest singular values and 
on a set of ill-conditioned matrices, respectively. We show that IRRHLB works well 
on them and confirm some theory. In the third subsection, we compare IRRHLB 
with the five other algorithms on seven practical problems that include very difficult, 
difficult and general ones, illustrating that IRRHLB is at least competitive with and 
can be much more efficient than the five other ones. 

5.1. IRRHLB for the clustered smallest singular values. This set of ex- 
periments is designed to see how IRRHLB behaves for the clustered smallest singular 
values. Similar to those matrices in 22J, we constructed a sequence of diagonal ma- 
trices As G 7?,"^", n — 1000, s — 1, . . . , 4 whose nine smallest singular values become 
increasingly more clustered as s increases. In the Matlab language: 

(5.2) As = spdiags{[l : : 1 + 9* 10"", 2 : 1 : 1000]', 0, 1000, 1000], s = 1,... ,4, 

whose smallest singular value ai — 1 and k{As) — 991 for all s. Since k,{As) is mod- 
erate, it is expected that IRRHLB computes cti accurately if it works. We computed 
(Ti by taking the parameters 

opts.m — 50, opts.maxit — 2000, opts. adjust — 9, opts.tol = le — 8 

and using the same starting vector generated randomly in a normal distribution for 
all s. Figure [nH] plots absolute residual norms of the computed singular triplets and 
relative errors \p — 1|, respectively. We see that IRRHLB succeeded for all s and 
computed the smallest singular value accurately. In the worst case s = 4, IRRHLB 
gave the relative error |jO — 1| = 3.7 x 10^*, the same order as the backward error that 
equals the relative residual norm 10^*. For s = 3, the relative error |yO— 1| = 3.5 x 10~^. 
Both are in agreement with the standard perturbation theory pi I31j. For s = 1,2, 
the relative errors \p — 1| are 1.3 x 10^^^ and 8.1 x 10^^"*, respectively, a few order 
smaller than the predicted relative error 0(10"^). Also, IRRHLB used considerably 
fewer restarts for s = 1 than for the other bigger s but had comparable restarts 
for s — 2,3,4. It was expected that IRRHLB converged faster for s = 1 than for 
s — 2,3,4 since the gap of cri and (T2 is considerably bigger for s = 1 than those for 
s = 2,3,4. 

5.2. IRRHLB for ill-conditioned matrices. We investigate the behavior of 
IRRHLB for a set of ill-conditioned matrices. Similar to those matrices in [35], we 
constructed a sequence of bidiagonal matrices Ag G 7?."^", n = 1000, s ~ 4, . . . ,7 and 
9, . . . , 12 with increasing condition numbers: 



(5.3) As = spdiags{linspace{l, 10^ 1000)', 0, 1000, 1000), 
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Fig. 5.1. Experiments with the matrices i5. 2V . with the clustering smallest singular value. 
Solid lines correspond to residual norms. Dotted lines to relative errors of the smallest singular 
value computed by IRRHLB. 



whose smallest singular value ai = 1 and condition numbers k{As) = 10". We 
computed <ti by taking the parameters 

opts.m — 50, opts.maxit = 2000, opts. adjust — 3, opts.tol — le ~ lA 

using the same starting vector generated randomly in a normal distribution for all 
s. Figure [121 plots relative errors |p — 1|. It was seen from the figure that IRRHLB 
computed the smallest singular value with relative error smaller than 10^^ for s = 
4,5,6,7. This confirms the perturbation theory: the smaller s is, the smaller the 
relative error is. For more ill-conditioned cases s = 9, 10, 11, 12, the accuracy of the 
computed smallest singular values deteriorated significantly. For 5 = 9, 10, the relative 
errors are 9.0 x 10~^ and 4.0 x 10~^, and the computed singular values have four and 
five correct decimal digits, respectively. In the worst-conditioned case that s = 12, 
the relative error is 0.3848; for s = 11, the relative error is 0.1678, and the computed 
smallest singular value was a little bit more accurate than that for s = 12. 

We also tested opts.tol — le — 12 and compared the results with those for 
opts.tol — le — 14. We found that although residual norms continued decreasing 
until 10^^^, the accuracy of the computed singular values was not improved further 
as residual norms decreased from opts.tol = le — 12 to opts.tol — le — 14. This was 
reflected by the figures, where we saw that relative errors did not decrease further and 
stabilized, starting from some restart for each s except s ~ 9. The curves for s — 9 
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jumped up and down when the algorithm approached convergence, but kept below 
10^'*. All these are in accordance with the predicted relative errors, which should 
not be bigger than a very modest multiple of k{As) x opts.tol. Another important 
observation is that IRRHLB used more restarts as s increases. Since the ratio SJi^z£x 

(72— (Tl ' 

the spread over the gap of ai and (72, increases as s does, it is more difficult for 
IRRHLB to solve the SVD problem as s increases. We also saw that the curves of 
relative errors oscillated quite often in the middle of convergence processes. A careful 
observation revealed that IRRHLB started to be on its way to compute the desired 
singular value at some stage but lost it soon. Then it adaptively adjusted convergence 
repeatedly and eventually was on the correct way to converge. These phenomena may 
be explained by Theorem 12.31 and the comments followed. 




Fig. 5.2. Experiments with the increasingly ill-conditioned diagonal matrices iS.St .with k{As) = 
10^s = 4,5,6,7,9, 10, 11, 12. 



5.3. Experiments of the six algorithms on practical problems. We now 

do numerical experiments on several selected problems that include very difficult, 
difficult and general ones. We compare IRRHLB with the five other algorithms: 
IRRLB, IRHLB, IRLB, IRLBA and IRLANB. 

Table [5T^ lists seven test matrices from [HIT] and some of their basic properties. 
Except welll852, all other matrices are square matrices. Note that the ratio ^'^^^p'^j.^^ 
indicates whether or not the six algorithms are difficult to converge. The bigger 
^^gap(kf^ more slowly the algorithms converge generally. From the table, we 

see that the k{= 1,3,5,10) desired smallest singular values of all the matrices are 
quite clustered; among them the matrices fidap4, jagmaeshS and lshp3205 are the 
most difficult, the matrix platl919 is relatively difhcult, and the matrices welll850 
and dw2048 are general. We see that all k{A)'s are not very large, so the columns 
of A are strongly linearly independent. It is expected that if the algorithms converge 
then they can compute the smallest singular values with relative errors no more than 
a very modest multiple of k{A) x opts.tol. 

We computed the k smallest singular triplets for different k. To make a reason- 
able comparison, for each matrix except welll850 we used the same starting vector 
generated randomly in a normal distribution for the six algorithms. For welll850, we 
took the same starting vector uq — randn(1850, 1) in IRLANB and the same start- 
ing vector vq — randn(712,l) in the five other algorithms. In all tables, denote by 
iter the number of restarts, by time CPU time in second, by n.c non-convergence 
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Table 5.2 

Six test matrices: fidap4 of order 1601 X 1601, jagmeshS of order 1141 X 1141, lshp3205 of 
order 3205 x 3205, welll850 of order 1850 x 712, pde2961 of order 2961 X 2961 and platl919 of order 
1919x1919. sprcad(A) = (T]v — (Ti and gap(fc) = min((Ti+i — Ui), i = 1, 2, fc. 



Matrix 


fidap4 


jagmeshS 


lshp3025 


welll850 


dw2048 


pde2961 


platl919 


nnz(A) 


31837 


7465 


20833 


8755 


10114 


14580 


32399 


k{A) 


2.4e + 3 


2.2e + 4 


6.8e + 4 


l.le + 2 


2.1e + 3 


6.4e + 2 


3.7e + 2 


spread(yl) 


1.6e + 


6.8e + 


7.0e + 


1.8e + 


9.8e - 1 


l.Oe + 1 


2.3e + 


gap(l) 


1.5e-3 


1.7e - 3 


1.8e - 3 


3.0e - 3 


2.6e - 3 


8.2e - 3 


2.6e - 3 


gap(3) 


2.5e - 4 


1.6e - 3 


9.1e - 4 


3.0e - 3 


2.9e - 4 


2.4e - 3 


1.8e - 3 


gap(5) 


2.5e - 4 


4.8e - 5 


1.8e - 4 


3.0e - 3 


2.9e - 4 


2.4e - 3 


2.7e - 4 


gap(lO) 


2.5e - 4 


4.8e - 5 


2.2e - 5 


2.6e - 3 


1.6e - 4 


5.2e - 4 


2.0e - 4 



after 2000 restarts are used, and by mv the number of matrix-vector products. Since 
matrix- vector products involving A are equal to those involving , we only count 
the number of matrix-vector products involving A. We compare restarts and matrix- 
vector products as well as CPU time needed by all the codes for the same k and m. 
The former two quantities reflect the overall efficiency of the codes more fairly and 
reasonably. 

By the above description, in IRHLB, IRRLB, IRHLB, IRLB and IRLBA we took 
the input parameters 

opts.k = fc, cypts.M_B — m, opts.tol = tol, opts.maxit — 2000, opts.vO = Vq 

and the others as defaults. In IRLANB we took 

eignum — k, options. k — m — I, options! = m — 1 — (fc + 3), 
options.uO = Uq, options .maxit = 2000, options. version —' harmonic' 

and the others as defaults. This parameters make all the codes compute the approxi- 
mate singular triplets with respect to certain subspaces of the same dimension m and 
use the same number of shifts at each restart. 

We found that fidap4, jagmeshS and lshp3025 challenged most of the six al- 
gorithms. Tables I5.3H5.5I report the results obtained by IRRHLB and IRRLB for 
opts.tol = le — 6. 

Clearly, for fidap4 and lshp3205, IRRHLB worked well and solved the problem 
successfully while IRRLB only performed well in some cases and was less efficient than 
IRRHLB. In contrast, IRHLB, IRLB, IRLBA and IRLANB performed more poorly 
and they all failed to converge for fidap4 and lshp3205. For jagmeshS, IRRHLB 
still worked robustly and efficiently, but IRRLB succeeded only in a few cases and 
IRHLB, IRLB, IRLBA and IRLANB behaved more badly. They all were considerably 
less efficient than IRRHLB if they worked. We found that, generally, the bigger k 
was, the more restarts IRRHLB and IRRLB used for the same m. This should not be 
surprising as the problem for a bigger k is generally more difficult to solve than that 
for a smaller k. We also observed that all the smallest singular values were computed 
with relative errors no more than a modest multiple of k{A) x 10^^. 

We had more observations on the behavior of IRHLB, IRLB, IRLBA and IRLANB 
on these three difficult problems. For example, the residual norms obtained by them 
may oscillated but decreased very slowly; they may have first decreased to some 
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Table 5.3 
fidap4 for fc = 1, 3, 5, 10 



k = l 


m — 15 


TO = 20 


TO = 25 


Method 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


807 


46.5 


8881 


430 


4428 


6884 


375 


62.0 


7879 


IRRLB 


n.c 






1929 


207 


30868 


1208 


187 


25372 


fc = 3 


m = 15 


TO = 20 


TO = 25 


Method 


iter 


time 


TO?; 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


1083 


55.4 


9753 


718 


60.5 


10058 


447 


64.1 


8499 


IRRLB 


n.c 






n.c 






1526 


213 


29000 


fc = 5 


m = 20 


TO = 25 


TO = 30 


Method 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


1797 


71.7 


12587 


1151 


114 


13820 


777 


100 


13217 


fc = 10 


m = 20 


TO = 25 


TO = 30 


Method 


iter 


time 


TOW 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


1959 


139 


13726 


1084 


142 


13021 


737 


134 


12542 



Table 5.4 
jagmeshS for fc = 1, 3, 5, 10 



fc = 1 


TO 20 


TO = 25 


TO = 30 


Method 


iter 


time 


TOW 


iter 


time 


TOW 


iter 


time 


mv 


IRRHLB 


1167 


81.3 


18676 


953 


122 


20017 


828 


117 


21532 


fc = 3 


TO = 2C 


) 


TO = 2E 




TO = 3( 


) 


Method 


iter 


time 


TOW 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


1563 


101 


21888 


1239 


136 


23547 


897 


126 


21534 


fc = 5 


TO = 2C 


) 


TO = 2E 




TO = 3( 


) 


Method 


iter 


time 


TOW 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


1521 


89.6 


18260 


1108 


135 


11844 


761 


107 


16750 


fc = 10 


TO = 2f 




TO = 31 


) 


TO = 3f 




Method 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


1216 


125 


14605 


882 


148 


15007 


793 


169 


17459 


IRRLB 


n.c 






1763 


316 


29984 


810 


175 


17833 


IRHLB 


n.c 






n.c 






1707 


204 


37567 


IRLB 


n.c 






n.c 






1853 


220 


40779 


IRLBA 


n.c 






n.c 






1919 


37 


42230 



stage and then oscillated; they might have first decreased, then stabilized and did 
not decrease further; they might have decreased to some stage and then increased. 
Therefore, IRRHLB is not only the best but also the unique choice for fidap4, jagmesh8 
and lshp3025 for most of the given fc's and to's. 

We tested welll850, pde2961, dw2048 and platl919 for opts.tol = le - 6, le - 9, 
respectively. Tables [5^5.91 report the results for opts.tol = le — 6. We do not list 
the corresponding results for opts.tol = le — 9, as will be explained shortly. 

We found that all the algorithms computed the desired smallest singular values 
correctly once they converged. The computed smallest singular values had relative 
errors no more than a very modest multiple of k(A) x 10~^. We observed that, in 
terms of restarts and matrix-vector products, IRRHLB was often considerably more 
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Table 5.5 
lshp3025 /or fc = 1,3,5 



k = 1 


m = 30 


TO = 40 


TO = 50 


Method 


iter 


time 


mv 


iter 


time 




iter 


time 


mv 


IRRHLB 


1116 


293 


30320 


886 


405 


31900 


947 


918 


43566 


IRRLB 


n.c 






n.c 






1604 


1522 


73788 


k = 3 


m = 30 


TO = 40 


TO = 50 


Method 


iter 


time 


mv 


iter 


time 




iter 


time 


TOtl 


IRRHLB 


1520 


496 


36486 


1139 


761 


38732 


971 


978 


42730 


IRRLB 


n.c 






n.c 






1116 


11133 


49110 


k^5 


m = 40 


TO = 51 


) 


TO = 60 


Method 


iter 


time 


mv 


iter 


time 




iter 


time 


mv 


IRRHLB 


n.c 






1931 


1906 


81110 


1656 


2439 


86120 



efficient and several times faster than the others except IRRLB. IRRLB was nearly 
as efficient as IRRHLB in many cases, and it was slightly better than IRRHLB in a 
few cases; see, e.g.. Table 1^77115. 81 for the results on dw2048 for to = 50 and pde2961 
for A: = 1, 3, TO = 50. However, for the relatively difficult platl919, it was less robust 
than IRRHLB and failed to converge for some k and to; see Table 15.91 IRHLB, 
IRLB, IRLBA and IRLANB were less robust and efficient than IRRLB, as the tables 
indicate. In addition, the results demonstrate that the bigger k was, the more restarts 
the algorithms used generally. 

For opts.tol — le — 9, we observed similar phenomena and had similar findings. 
The only essential exception is that for plat 1919 and fc = 5, to = 15, IRRLB did not 
converge after 2000 restarts were used. Furthermore, we found that for the four test 
matrices all the algorithms used more restarts for opts.tol = le — 9 than those for 
opts.tol = le — 6 and they continued converging very smoothly from opts.tol = le — 6 
to opts.tol = le — 9, provided they converged. Hence we do not list the results 
anymore. 

To be more illustrative, we draw some typical curves that feature general conver- 
gence processes of the six algorithms. Figures [573^5.41 depict absolute residual norms 
versus restarts for welll850 when fc = 1, 3 and opts.tol = le — 6, le — 9, respectively. 
The figures clearly demonstrate that IRRHLB is the fastest, IRRLB is the second 
best, IRHLB is faster than IRLB while IRHLB, IRLB, IRLBA and IRLANB are com- 
parable and competitive though IRLANB may be slightly slower. The tables tell us 
that IRHLB was faster than IRLB. We see from the figures that after some stages 
the algorithms started converging quite smoothly and they used more but not too 
more restarts for the smaller opts.tol = le — 9. Besides, for IRLANB, we see that 
they computed the smallest singular triplet after many restarts then found the second 
and third smallest singular triplets very quickly. This is because after the previous 
singular triplet (s) was (were) computed the available subspaces had contained rich 
information on the later desired singular vectors. 

We have done more experiments and have similar findings. Based on them, we 
may conclude that IRRHLB is the best and the most robust for general purpose 
and IRRLB is the second best. As far as overall efficiency is concerned, in terms 
of restarts and matrix-vector products, IRRHLB is the fastest and IRRLB is the 
second best while IRHLB, IRLB, IRLBA and IRLANB are all comparable each other 
and no one is considerably superior to the others. A further observation tells us that 
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Table 5.6 
welll850 for fc = 1, 3, 5, 10, tol = le - 6 



k = 1 


TO 


15 


20 


25 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


71 


2.67 


785 


43 


2.98 


692 


35 


4.18 


739 


IRRLB 


69 


2.51 


763 


62 


4.48 


996 


35 


4.16 


739 


IRIILB 


168 


•1.32 


1852 


83 


4.99 


1332 


51 


5.55 


1075 


IRLB 


183 


5.99 


2017 


91 


5.63 


1460 


55 


6.01 


1159 


IRLBA 


191 


1.90 


2105 


93 


1.21 


1492 


57 


1.00 


1201 


IRLANB 


279 


7.88 


2795 


133 


6.61 


2000 


82 


6.40 


1645 


fc = 3 


TO 


15 


20 


25 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


80 


3.10 


726 


51 


3.53 


720 


35 


4.08 


671 


IRRLB 


103 


4.15 


933 


63 


4.43 


888 


41 


5.16 


785 


IRHLB 


171 


5.85 


1545 


76 


4.51 


1070 


46 


4.47 


880 


IRLB 


184 


4.96 


1662 


82 


4.84 


1154 


49 


4.80 


937 


IRLBA 


189 


1.70 


1707 


83 


1.01 


1168 


50 


0.86 


956 


IRLANB 


259 


6.11 


2079 


109 


4.79 


1424 


63 


4.10 


1141 


fc = 5 


TO 


15 


20 


25 


Algorithms 


iter 


time 


mv 


iter 


time 


m,v 


iter 


time 


TO,?; 


IRRHLB 


Wr) 


3.18 


743 


60 


4.00 


728 


43 


4.99 


739 


IRRLB 


161 


5.53 


1135 


70 


4.51 


848 


50 


4.64 


688 


IRHLB 


248 


6.43 


1744 


94 


5.14 


1136 


53 


4.06 


909 


IRLB 


275 


6.08 


1933 


103 


4.70 


1244 


58 


4.69 


994 


IRLBA 


292 


2.23 


2052 


108 


1.26 


1304 


60 


0.97 


1028 


IRLANB 


388 


7.30 


2337 


128 


4.58 


1417 


69 


3.41 


1113 


fc= 10 


TO 


20 


25 


30 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


114 


6.81 


811 


63 


6.56 


769 


40 


6.54 


693 


IRRLB 


171 


9.47 


1210 


69 


7.18 


841 


42 


6.76 


727 


IRHLB 


194 


5.81 


1371 


77 


4.34 


937 


45 


3.94 


778 


IRLB 


202 


5.50 


1427 


82 


4.77 


997 


47 


4.05 


812 


IRLBA 


170 


1.61 


1196 


72 


0.98 


871 


43 


0.81 


739 


IRLANB 


282 


5.73 


1706 


99 


3.89 


1103 


56 


3.43 


910 



IRHLB is faster than IRLB. That IRRHLB is superior to IRRLB and IRHLB is better 
than IRLB sheds light on the fact argued in the introduction: The refined harmonic 
projection and the harmonic projection are more suitable for computing the smallest 
singular triplets than the refined standard projection and the standard projection, 
respectively. Meanwhile, we find that, as far as CPU timings are concerned, IRRHLB 
can be inferior to IRLBA. This may be partly because A is not very large or A too 
sparse, so that the savings of the first fc + 3 steps of the Lanczos bidiagonalization 
process cannot compensate implicit restarting with m — (fc + 3) shifts, and partly 
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Table 5.7 
dw2048 for k = 1,3, 5, 10, tol = le - 6 



k = 1 


m 


30 


40 


50 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


93 


24.8 


2242 


77 


38.1 


2776 


64 


37.7 


2948 


IRRLB 


156 


36.9 


4060 


52 


23.3 


1876 


46 


30.0 


2120 


ffillLB 


23() 


.")9.(i 


6140 


128 


57.2 


4(il2 


82 


47.9 


377G 


IRLB 


266 


67.5 


6920 


145 


62.9 


5224 


93 


57.5 


4282 


IRLBA 


276 


9.71 


7180 


148 


7.54 


5332 


94 


6.64 


4328 


IRLANB 


406 


64.2 


10155 


219 


70.0 


7670 


142 


60.4 


6395 


k = 3 


m 


30 


40 


50 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


100 


22.7 


2406 


81 


34.8 


2760 


67 


40.9 


2954 


IRRLB 


117 


26.2 


2814 


72 


31.5 


2454 


46 


32.8 


2030 


IRHLB 


209 


40.8 


5022 


110 


41.8 


3746 


71 


42.7 


3130 


IRLB 


230 


44.0 


5526 


120 


45.2 


4086 


78 


44.9 


3438 


IRLBA 


238 


7.84 


5718 


124 


6.21 


4222 


80 


5.55 


3526 


IRLANB 


280 


29.3 


6447 


142 


42.7 


4693 


92 


35.5 


3963 


fc = 5 


TO 


30 


40 


50 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


m,v 


IISRIILB 


107 


22.0 


23G2 


80 


30. 9 


2568 


()9 


49.8 


2906 


IRRLB 


146 


32.7 


6220 


75 


35.2 


2408 


49 


33.6 


2066 


IRHLB 


186 


31.5 


4100 


95 


32.7 


3048 


63 


35.1 


2654 


IRLB 


266 


47.0 


5860 


134 


43.0 


4296 


85 


47.8 


3578 


IRLBA 


287 


9.27 


6321 


142 


6.79 


4552 


89 


6.06 


3746 


IRLANB 


224 


28.9 


4713 


114 


23.4 


3543 


73 


28.0 


3002 


fc = 10 


TO 


30 


40 


50 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


190 


38.9 


3243 


110 


46.2 


2983 


89 


67.7 


3306 


IRRLB 


246 


51.4 


4195 


136 


65.2 


3685 


68 


54.1 


2529 


IRHLB 


443 


63.5 


7544 


187 


59.3 


5062 


110 


56.8 


4083 


IRLB 


483 


69.0 


8224 


203 


63.6 


5494 


117 


60.0 


4342 


IRLBA 


335 


9.99 


5636 


159 


7.21 


4250 


90 


5.85 


3322 


IRLANB 


546 


54.7 


8750 


222 


43.7 


5786 


126 


44.2 


4550 



becaiisc our code on implicit restarting is not far from optimized. In any event, as a 
whole, we can draw an overall conclusion that IRRHLB is at least competitive with 
and can be much more efficient than the five other state of the art algorithms in both 
robustness and efficiency. 

The advantages of IRRHLB arc twofold: It extracts the best left and right approx- 
imate singular vectors from the given subspaces in the sense of residual minimizations; 
it uses the better refined harmonic shifts to construct better subspaces at each restart. 
Each of these two advantages alone may not gain much, but, as restarts proceed, the 
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Table 5.8 
pde2961 for k = l,3, 5, 10, tol = le - 6 



k ^ 1 


TO 


30 


40 


50 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


127 


44.1 


3306 


92 


41.3 


3316 


85 


78.9 


3914 


IRRLB 


230 


77.7 


5984 


147 


80.3 


5296 


66 


59.7 


3040 


IRHLB 


371 


116 


9650 


193 


106 


6952 


122 


105 


5616 


IRLB 


425 


136 


11054 


226 


133 


8140 


142 


122 


6536 


IRLBA 


463 


21.6 


12042 


238 


16.6 


8572 


148 


14.5 


6812 


IRLANB 


490 


100 


12255 


252 


103 


8825 


157 


99.0 


7070 


fc = 3 


TO 


30 


40 


50 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


149 


44.7 


3582 


113 


62.2 


3848 


101 


83.1 


4450 


IRRLB 


254 


83.6 


6102 


149 


90.3 


5072 


85 


63.3 


3746 


IRHLB 


475 


137 


11406 


239 


131 


8132 


146 


113 


6430 


IRLB 


537 


143 


12894 


272 


134 


9254 


167 


139 


7354 


IRLBA 


581 


27.9 


13950 


284 


19.3 


9662 


172 


16.3 


7574 


IRLANB 


686 


133 


15785 


339 


111 


11194 


204 


97.3 


8779 


fc = 5 


TO 


30 


40 


50 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


161 


52.0 


3550 


127 


78.6 


4082 


100 


97.6 


4208 


IRRLB 


239 


65.6 


5266 


110 


66.2 


3528 


103 


99.2 


4334 


IRHLB 


518 


122 


11404 


248 


114 


7944 


149 


111 


6266 


IRLB 


575 


113 


12658 


278 


126 


8904 


165 


116 


6938 


IRLBA 


579 


25.1 


12745 


273 


18.2 


8743 


164 


15.2 


6895 


IRLANB 


604 


99.7 


12693 


284 


84.6 


8813 


169 


71.5 


6938 


fc = 10 


TO 


30 


40 


50 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


290 


83.7 


4943 


176 


80.7 


4765 


139 


145 


5156 


IRRLB 


571 


145 


9720 


189 


108 


5116 


144 


132 


5341 


IRHLB 


925 


212 


15738 


373 


163 


10084 


205 


150 


7598 


IRLB 


1004 


193 


17081 


403 


157 


10894 


223 


140 


8264 


IRLBA 


601 


23.3 


10157 


258 


15.9 


6948 


145 


12.8 


5355 


IRLANB 


1302 


138 


20846 


502 


119 


13066 


269 


106 


9698 



cumulative effect of their combination may be very striking, so that IRRHLB can be 
much more efficient than the other algorithms, as also noticed and commented on the 
refined algorithms for the large eigenproblem in [15l [17] . 

6. Concluding remarks. We have presented the refined harmonic Lanczos bidi- 
agonalization method for computing the smallest singular triplets of large matrices. 
We have developed a practical implicitly restarted algorithm with the refined har- 
monic shifts scheme suggested. We have done many numerical experiments and have 
compared the new algorithm with the five other state of the art algorithms. The 
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Table 5.9 
platl919 for fc = 1, 3, 5, 10, tol = le - 6 



k = 1 


m 


15 


20 


25 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


146 


8.20 


1610 


129 


15.0 


2068 


74 


12.0 


1558 


IRRLB 


351 


22.7 


3865 


200 


23.0 


3204 


105 


19.0 


2209 


IRHLB 


605 


37.5 


6659 


286 


27.0 


4580 


163 


28.3 


3427 


IRLB 


671 


41.6 


7385 


313 


32.9 


5012 


183 


29.3 


3847 


IRLBA 


727 


12.7 


8001 


337 


8.08 


5396 


191 


6.15 


4015 


IRLANB 


943 


33.2 


9435 


425 


27.9 


6380 


234 


29.2 


4685 


k = i 


m 


15 


20 


25 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


214 


10.9 


1932 


209 


17.7 


2890 


153 


21.4 


2913 


IRRLB 


605 


30.5 


5451 


179 


15.3 


2512 


123 


16.4 


2343 


IRHLB 


923 


37.2 


8313 


379 


31.9 


5312 


214 


28.4 


4072 


IRLB 


1089 


48.7 


9807 


451 


37.8 


6320 


253 


28.8 


4813 


IRLBA 


1277 


17.7 


11499 


511 


10.5 


7160 


273 


8.44 


5193 


IRLANB 


1428 


46.4 


11431 


545 


32.7 


7092 


286 


25.6 


5155 


/c = 5 


m 


15 


20 


25 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


385 


15.3 


2703 


237 


19.7 


2582 


171 


22.2 


2915 


IRRLB 


1983 


86.0 


13889 


454 


40.8 


5456 


266 


35.8 


4530 


IRHLB 


n.c 






793 


48.2 


9524 


387 


43.7 


6587 


IRLB 


n.c 






869 


64.0 


10436 


432 


46.7 


7352 


IRLBA 


n.c 






722 


13.5 


8647 


352 


9.77 


5983 


IRLANB 


n.c 






1150 


54.5 


12659 


524 


45.9 


8393 


/c = 10 


m 


20 


25 


30 


Algorithms 


iter 


time 


mv 


iter 


time 


mv 


iter 


time 


mv 


IRRHLB 


685 


45.1 


4808 


368 


45.5 


4429 


395 


94.9 


6728 


IRRLB 


n.c. 






1283 


165 


15409 


490 


116 


8343 


IRHLB 


n.c. 






1809 


192 


21721 


891 


155 


15160 


IRLB 


n.c. 






n.c. 






1428 


209 


24289 


IRLBA 


n.c. 






1317 


26.8 


12412 


562 


17.2 


8330 


IRLANB 


n.c. 






n.c. 






1039 


92.7 


16638 



results show that the new algorithm is at least competitive with and can be much 
more efficient than the five other algorithms in both robustness and efficiency. 

We have reported the numerical results of computation of the smallest singular 
triplets. We have also done many numerical experiments on computation of the largest 
singular triplets. As indicated in [20], IRRLB is generally preferable to IRLB [201123]. 
PRO PACK [24], LANSO [23l[24] and svds as well as some others; it is the most robust 
among the restarted algorithms. Note that IRLANB is designed to only compute the 
smallest singular triplets. For computation of the largest singular triplets, we have 
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Fig. 5.3. Convergence curves of welll850 with fc = 1 and tol = le — 6, le — 9. 



found that IRRLB is at least competitive with the four other algorithms, in which 
IRLBA uses Ritz approximations. However, more observations reveal that IRRHLB 
and IRHLB are considerably inferior to IRRLB and IRLB, respectively. This suggests 
that IRRLB and IRLB are suitable for computing both the largest singular triplets 
and the smallest ones but IRRHLB and IRHLB are more suitable for computing the 
smallest singular triplets. 

The Matlab codes of IRRHLB, IRRLB, IRHLB and IRLB can be obtained from 
the authors upon request. 

Acknowledgements. We thank two referees very much for their very valuable 
and helpful suggestions and comments, which made us improve on the presentation 
considerably. Many thanks also go to Kokiopoulou, Bekas, Gallopoulos and Baglama 
and Reichel for generously providing us their IRLANB and IRLBA codes, which made 
our numerical experiments and comparisons possible. 



REFERENCES 

[1] Z. Bai, R. Barret, D. Day, J. Demmel and J. Dongarra, Test matrix col- 
lection for non-Hermitian eigenvalue problems, Technical Report CS-97-355, Uni- 
versity of Tennessee, Knoxville, 1997. LAPACK Note #123. Data available at 
'http:/ /math.nist . gov /MarketMatrix 

[2] Z. Bai, ,1. Demmel, J. Dongarra, A. Ruhe, H. A. van der Vorst, Templates for the Solution 
of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, PA, 2000. 



30 



ZHONGXIAO JIA AND DATIAN NIU 




Fig. 5.4. Convergence curves of welll850 with k = 3 and tol = le — 6, le — 9. 



[3] ,J. Baglama and L. Reichel, Augmented implicitly restarted Lanczos bidiagonalization meth- 
ods, SIAM J. Sci. Comput. 27 (2005), pp. 19-42. 

[4] J. Baglama and L. Reichel, Restarted block Lanczos bidiagonalization methods, Numer. Al- 
gor., 43 (2006), pp. 251-272. 

[5] A. BjORCK, Numerical Methods for Least Sguares Problems, SIAM, Philadelphia, PA, 1996. 

[6] A. BjorCK, E. Grimme and P. van Dooren, An implicitly bidiagonalization algorithm for 
ill-posed systems, BIT, 34 (1994), pp. 510-534. 

[7] I. S. Duff, R. G. Grimes and ,1. G. Lewis, User's guide for the Harwell-Boeing sparse matrix 
collection (Release 1), Technical Report, RAL-92-086, Rutherford Appleton Laboratory, 
UK, 1992. Data available at http://math.nist.gov/MarketMatrix 

[8] G. H. GOLUB, F. T. LuK and M. L. Overton, A block Lanczos method for computing the singu- 
lar values and singular vectors of a matrix, ACM Trans. Math. Soft., 7 (1981), pp. 149—169. 

[9] G. H. GOLUB and C. F. Van Loan, Matrix Computations, 3rd ed.. The Johns Hopkins Uni- 
versity Press, Baltimore, 1996. 
[10] V. Hernandez, J. E. Roman and A. Tomas, A robust and efficient parallel SVD solver based 

on restarted Lanczos bidiagonalization, submitted (2007). 
[11] M. E. Hochstenbach, A Jacobi-Davidson type SVD method, SIAM J. Sci. Comput., 23 (2001), 
pp. 606-628. 

[12] M. E. Hochstenbach, Harmonic and refined extraction methods for the singular value prob- 
lem, with applications in least squares problems, BIT, 44 (2004), pp. 721-754. 

[13] Z. ,IlA, Refined iterative algorithms based on Arnoldi's process for large unsymmetric eigen- 
problems. Linear Algebra Appl., 259 (1997), pp. 1—23. 

[14] , A refined iterative algorithm based on the block Arnoldi process for large unsymmetric 

eigenproblems, Linear Algebra Appl., 270 (1998), pp. 170—189. 

[15] , Polynomial characterizations of the approximate eigenvectors by the refined Arnoldi 

method and an implicitly restarted refined Arnoldi algorithm, Linear Algebra Appl., 287 
(1999), pp. 191-214. 



A REFINED HARMONIC LANCZOS BIDIAGONALIZATION METHOD 



31 



[16] , A refined subspace iteration algorithm for large sparse eigenproblems, Appl. Numer. 

Math., 32 (2000), pp. 35-52. 
[17] , The refined harmonic Arnoldi method and an implicitly restarted refined algorithm for 

computing interior eigenpairs of large matrices, Appl. Numer. Appl., 42 (2002), pp. 489- 

512. 

[18] , Some theoretical comparisons of refined Ritz vectors and Ritz vectors, Science in China, 

Series A, (47) Suppl. (2004), pp. 222-233. 

[19] , The convergence of harmonic Ritz values, harmonic Ritz vectors and refined harmonic 

Ritz vectors, Math. Comput., 74 (2005), pp. 1441-1456. 

[20] Z. ,IlA AND D. Niu, An implicitly restarted refined bidiagonalization Lanczos method for com- 
puting a partial singular value decomposition, SIAM J. Matrix Anal. Appl., 25 (2003), 
pp. 246-265. 

[21] Z. JiA AND G. W. Stewart, An analysis of the Ray leigh- Ritz method for approximating 
eigenspaces. Math. Comput., 70 (2001), pp. 637-647. 

[22] E. KOKIOPOULOU, C. Bekas AND E. Gallopoulos, Computing smallest singular triplets with 
implicitly restarted Lanczos bidiagonalization, Appl. Numer. Math., 49 (2004), pp. 39—61. 

[23] R. M. Larsen, Lanczos bidiagonalization with partial reorthogonalization, Chapter of Ph.D. 

thesis. Department of Computer Science, University of Aarhus, Danmark, 1998. Available 
online from http://soi.standford.edu/~rmunk 

[24] R. M. Larsen, Combining implicit restarts and partial reorthogonalization in Lanczos bidiag- 
onalization, http:/ /soi. stanford.edu/~rmunk/PROPACK 

[25] R. B. Morgan and M. Zeng, Implicitly restarted GMRES and Arnoldi methods for nonsym- 
metric linear systems of eguations, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1112—1135. 

[26] R. B. Morgan, A harmonic restarted Arnoldi algorithm for calculating eigenvalues and de- 
termining multiplicity, Linear Algebra Appl., 415 (2006), pp. 96—113. 

[27] C. C. Paige and M. A. Saunders, Algorithm 583 LSQR: Sparse linear equations and sparse 
least squares, ACM Trans. Math. Software, 8 (1982), pp. 195-209. 

[28] B. N. Parlett, The Symmetric Eigenvalue Problem, SIAM, Philadelphia, PA, 1998. 

[29] H. D. Simon and H. Zha, Low-rank matrix approximation using the Lanczos bidiagonalization 
process with applications, SIAM J. Sci. Comput., 21 (2000), pp. 2257-2274. 

[30] D. C. Sorensen, Implicit application of polynomial filters in a k-step Arnoldi method, SIAM 
J. Matrix Anal. Appl., 13 (1992), pp. 357-385. 

[31] G. W. Stewart, A Krylov-Schur algorithm for large eigenproblems, SIAM J. Matrix Anal. 
Appl., 23 (2001), pp. 601-614. 

[32] G. W. Stewart, Matrix Algorithms Vol.11: Eigensystems, SIAM, Philadelphia, PA, 2001. 

[33] M. Stoll, a Krylov-Schur approach to the truncated SVD, NA-08-03, Laboratory of Comput- 
ing Laboratory, Oxford University, 2008. 

[34] H. A. VAN DER VORST, Computational Methods for Large Eigenvalue Problems, In P. G. Ciarlet 
and J. L. Lions (eds.). Handbook of Numerical Analysis, Vol. VIII, North-Holland, Elsevier, 
pp. 3-179, 2002. 



